73 research outputs found
Dual Progressive Transformations for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation (WSSS), which aims to mine the object
regions by merely using class-level labels, is a challenging task in computer
vision. Current state-of-the-art CNN-based methods usually adopt
Class-Activation-Maps (CAMs) to highlight the potential areas of the object;
however, they may suffer from partial activation. To this end, we make an
early attempt to explore the global feature attention mechanism of the vision
transformer in the WSSS task. However, since the transformer lacks the inductive
bias of CNN models, it cannot boost the performance directly and may yield
over-activation. To tackle these drawbacks, we propose a
Convolutional Neural Networks Refined Transformer (CRT) to mine globally
complete and locally accurate class activation maps in this paper. To validate
the effectiveness of our proposed method, extensive experiments are conducted
on PASCAL VOC 2012 and CUB-200-2011 datasets. Experimental evaluations show
that our proposed CRT achieves new state-of-the-art performance on both the
weakly supervised semantic segmentation task and the weakly supervised object
localization task, outperforming other methods by a large margin.
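
For readers unfamiliar with the class activation maps mentioned in this abstract, the following is a minimal sketch of the standard CAM computation (a class-weighted sum of feature channels, clipped at zero and rescaled). It is not the CRT method itself; the function name, the toy feature maps, and the weights are hypothetical values chosen for illustration.

```python
from typing import List

Map2D = List[List[float]]


def class_activation_map(features: List[Map2D], class_weights: List[float]) -> Map2D:
    """Weighted sum of feature channels, clipped at zero and scaled to [0, 1]."""
    if len(features) != len(class_weights):
        raise ValueError("one weight per feature channel is required")
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    # Accumulate each channel, scaled by the classifier weight for the target class.
    for channel, weight in zip(features, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    # Keep only positive evidence, then normalize by the peak response.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy input (assumed): two 2x2 feature channels and one weight per channel.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = [2.0, 0.5]
cam = class_activation_map(features, weights)
```

Because the map is driven by whichever channels the classifier weights most, it tends to peak on the most discriminative region, which is the partial-activation behavior the abstract describes.
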
Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization
Weakly supervised object localization (WSOL) strives to learn to localize
objects with only image-level supervision. Due to the local receptive fields
generated by convolution operations, previous CNN-based methods suffer from
partial activation issues, concentrating on the object's discriminative part
instead of the entire entity scope. Benefiting from the capability of the
self-attention mechanism to acquire long-range feature dependencies, Vision
Transformer has been recently applied to alleviate the local activation
drawbacks. However, since the transformer lacks the inductive localization bias
that is inherent in CNNs, it may cause a divergent activation problem,
resulting in an uncertain distinction between foreground and background. In
this work, we propose a novel transformer-based Semantic-Constraint Matching
Network (SCMN) to make the divergent activation converge. Specifically, we first
propose a local patch shuffle strategy to construct the image pairs, disrupting
local patches while guaranteeing global consistency. The paired images, which
contain the same object in the spatial domain, are then fed into the Siamese
network encoder. We further design a semantic-constraint matching module, which
aims to mine the co-object part by matching the coarse class activation maps
(CAMs) extracted from the paired images, thus implicitly guiding and calibrating the
transformer network to alleviate the divergent activation. Extensive
experiments on two challenging benchmarks, CUB-200-2011 and ILSVRC, show that
our method achieves new state-of-the-art performance and outperforms previous
methods by a large margin.
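
The local patch shuffle described in this abstract could be sketched as below. The abstract does not give the exact procedure, so this is one plausible reading under stated assumptions: the image is split into blocks, and small patches are permuted only inside their own block, which disrupts local detail while leaving the coarse layout in place. The function name and the `patch`, `block`, and `seed` parameters are hypothetical.

```python
import random
from typing import List

Image = List[List[int]]


def local_patch_shuffle(image: Image, patch: int, block: int, seed: int = 0) -> Image:
    """Shuffle patch-sized tiles only within each block of block x block tiles."""
    height, width = len(image), len(image[0])
    step = patch * block
    if height % step or width % step:
        raise ValueError("image size must be a multiple of patch * block")
    rng = random.Random(seed)
    out = [row[:] for row in image]
    for by in range(0, height, step):
        for bx in range(0, width, step):
            # Top-left corners of the tiles inside this block.
            corners = [(by + i * patch, bx + j * patch)
                       for i in range(block) for j in range(block)]
            sources = corners[:]
            rng.shuffle(sources)
            # Copy each source tile into its new position within the same block.
            for (dy, dx), (sy, sx) in zip(corners, sources):
                for r in range(patch):
                    out[dy + r][dx:dx + patch] = image[sy + r][sx:sx + patch]
    return out


# Toy 4x4 "image" (assumed): pixel values 0..15, shuffled inside each 2x2 block.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
shuffled = local_patch_shuffle(image, patch=1, block=2, seed=0)
```

The original and shuffled images then form a pair that shares the same object at the block level, which is the kind of input a Siamese encoder would receive.
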
LeCaRDv2: A Large-Scale Chinese Legal Case Retrieval Dataset
As an important component of intelligent legal systems, legal case retrieval
plays a critical role in ensuring judicial justice and fairness. However, the
development of legal case retrieval technologies in the Chinese legal system is
restricted by three problems in existing datasets: limited data size, narrow
definitions of legal relevance, and naive candidate pooling strategies used in
data sampling. To alleviate these issues, we introduce LeCaRDv2, a large-scale
Legal Case Retrieval Dataset (version 2). It consists of 800 queries and 55,192
candidates extracted from 4.3 million criminal case documents. To the best of
our knowledge, LeCaRDv2 is one of the largest Chinese legal case retrieval
datasets, providing extensive coverage of criminal charges. Additionally, we
enrich the existing relevance criteria by considering three key aspects:
characterization, penalty, and procedure. These comprehensive criteria enrich the
dataset and may provide a more holistic perspective. Furthermore, we propose a
two-level candidate set pooling strategy that effectively identifies potential
candidates for each query case. It's important to note that all cases in the
dataset have been annotated by multiple legal experts specializing in criminal
law. Their expertise ensures the accuracy and reliability of the annotations.
We evaluate several state-of-the-art retrieval models on LeCaRDv2,
demonstrating that there is still significant room for improvement in legal
case retrieval. The details of LeCaRDv2 can be found at the anonymous website
https://github.com/anonymous1113243/LeCaRDv2
WMFormer++: Nested Transformer for Visible Watermark Removal via Implicit Joint Learning
Watermarking serves as a widely adopted approach to safeguard media
copyright. In parallel, the research focus has extended to watermark removal
techniques, offering an adversarial means to enhance watermark robustness and
foster advancements in the watermarking field. Existing watermark removal
methods mainly rely on UNet with task-specific decoder branches--one for
watermark localization and the other for background image restoration. However,
watermark localization and background restoration are not isolated tasks;
precise watermark localization inherently implies regions necessitating
restoration, and the background restoration process contributes to more
accurate watermark localization. To holistically integrate information from
both branches, we introduce an implicit joint learning paradigm. This empowers
the network to autonomously navigate the flow of information between implicit
branches through a gate mechanism. Furthermore, we employ cross-channel
attention to facilitate local detail restoration and holistic structural
comprehension, while harnessing nested structures to integrate multi-scale
information. Extensive experiments are conducted on various challenging
benchmarks to validate the effectiveness of our proposed method. The results
demonstrate our approach's remarkable superiority, surpassing existing
state-of-the-art methods by a large margin.